Introduction to Convolutional Neural Networks

We have been using fully connected networks (FCNs) to classify the MNIST dataset, and in the last assignment we designed a network which could do this with an accuracy of around 98%.
Convolutional Neural Networks (also called Convnets or CNNs) are an even more powerful tool for classifying images such as MNIST. You might ask: what do Convnets do that FCNs can't?

To understand this, let's directly compare FCNs and CNNs on the task of classifying MNIST data.

Pull in the MNIST data
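A minimal sketch of pulling in the data via the standard Keras loader (this assumes TensorFlow/Keras is installed; the notebook's own load cell may differ in details such as scaling):

```python
# Load MNIST: 60,000 training and 10,000 test images of 28x28 greyscale digits.
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Scale pixel values from 0-255 to [0, 1]; labels stay as integers 0-9.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
```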

Build a Fully-Connected Network (FCN) and train it

We will build a simple 1-hidden-layer network. We will use 400 hidden nodes since that was close to optimal based on our earlier studies.

NOTE: We will then save the network for later use.
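A sketch of what this network looks like in Keras; the layer choices other than the 400 hidden nodes (activation, optimizer, filename) are assumptions, not the notebook's exact code:

```python
# 1-hidden-layer FCN: 784 inputs -> 400 hidden nodes -> 10 output classes.
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import Dense, Flatten

fcn = Sequential([
    Input(shape=(28, 28)),
    Flatten(),                           # 28x28 image -> 784-element vector
    Dense(400, activation="relu"),       # hidden layer (400 nodes)
    Dense(10, activation="softmax"),     # one output per digit class
])
fcn.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
# fcn.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
fcn.save("fcn_mnist.keras")              # save for later reuse
```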

Building a Convolutional Neural Network (Convnet or CNN)

Let's try to build a CNN to classify MNIST images.

This is based on the network in the notebook: how_cnn_works.ipynb in this directory.

NOTE: We will then save the network for later use.
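A sketch of a small Convnet in the same spirit; the exact architecture is an assumption (the real one is in how_cnn_works.ipynb), chosen only to illustrate the conv/pool/dense pattern at a parameter count far below the FCN's:

```python
# Small Convnet: two conv/pool stages, then a dense classifier head.
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

cnn = Sequential([
    Input(shape=(28, 28, 1)),                # conv layers need a channel axis
    Conv2D(16, (3, 3), activation="relu"),   # 16 learned 3x3 feature detectors
    MaxPooling2D((2, 2)),
    Conv2D(32, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(32, activation="relu"),
    Dense(10, activation="softmax"),
])
cnn.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
cnn.save("cnn_mnist.keras")                  # save for later reuse
```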

A method to get performance numbers

The following method will be helpful later to get loss, accuracy, and the confusion matrix for our network.

We can use this for both the FCN as well as the CNN.
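A sketch of such a helper; the name `getPerformance` matches the notebook, but the body here is an assumption built from standard Keras and scikit-learn calls:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def getPerformance(model, x, y):
    """Return (loss, accuracy, confusion_matrix) for a model on data x, y."""
    loss, acc = model.evaluate(x, y, verbose=0)
    # Predicted class = index of the largest softmax output.
    preds = np.argmax(model.predict(x, verbose=0), axis=1)
    cm = confusion_matrix(y, preds)
    return loss, acc, cm
```

Because it only uses `evaluate` and `predict`, the same function works unchanged for both the FCN and the CNN.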

Load the saved FCN and CNN networks

Since we trained the networks in this same notebook, we don't really need to do this step, but this shows you how to do it.

Comparison Number 1: Overall performance

We will use our "getPerformance" method. Are the networks similar?

Comparison Number 2: How big are the networks?

Keras gives us a tool to get summary information about our network:

Calculating the Number of parameters

Notice that the two layers are called dense: these are fully connected layers, meaning there is a connection from every input to every output. Here is how we get the parameters:

  1. dense_3 (the "_3" is just a label from when the network was created, it has no significance): We have 784 inputs each connected to 400 hidden nodes: 784*400=313600 parameters, plus another 400 "bias" parameters (1 for each node) which gives us a total of 314,000 parameters for the hidden layer.
  2. dense_4: We have 400 inputs (one each from the hidden layer) connected to 10 outputs: 400*10 + 10 (bias) = 4010 parameters for the output layer.

So we have 318,010 total parameters for a network which is used to classify small 28x28 greyscale images. If we went to megapixel color images, we would have 3x1000x1000 = 3,000,000 input pixels, and if we have a 400-node hidden layer (which is probably too small), we end up with more than 1.2 billion parameters... this does not scale!
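A quick arithmetic check of the counts worked out above:

```python
# MNIST FCN: 784 inputs -> 400 hidden nodes -> 10 outputs.
hidden = 784 * 400 + 400      # dense_3: weights + biases = 314,000
output = 400 * 10 + 10        # dense_4: weights + biases = 4,010
total = hidden + output
print(total)                  # 318010

# The megapixel-color thought experiment: 3,000,000 inputs, same architecture.
big = 3_000_000 * 400 + 400 + 400 * 10 + 10
print(big)                    # 1200004410 -- over 1.2 billion parameters
```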

Assignment Task 0: Calculate FCN Parameters for a Large Color Image

If we have a 1000x1000 image, this is a megapixel image (1000x1000 = 1,000,000 pixels). If it is a color image, then each pixel has 3 values (corresponding to RGB - red, green, blue), for a total of 3,000,000 inputs. If we were to classify color images of digits at this size (10 classes total) with a single hidden layer of 400 nodes, how many parameters are needed?

  1. 3 million pixels means we have an input of 3 million values.
  2. Weights into the hidden layer: 3 million * 400 hidden nodes.
  3. Each hidden node has a bias: 400.
  4. Weights into the output layer: 400 hidden nodes * 10 classes/outputs.
  5. Each output node has a bias: 10.

Comparison Number 3: Sensitivity to Variations in the Input

The types of variations I want us to consider include:

  1. Shifts of the input image (up/down and/or right/left).
  2. Scaling of the input image (making it bigger or smaller, still within the 28x28 pixel window).
  3. Rotations of the input image.

We could also include shearing of the input image, but for now let's just consider the first 3.

Keras includes a method for performing all of these operations on an image. The method is described in detail here:

Let's define a method to do this, using a single image as an input, and also define a method to display the image:
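A sketch of two such helpers, built on scipy.ndimage rather than the Keras utility (an assumption -- the notebook's own methods, and their names, may differ):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage

def transformImage(img, angle=0.0, zoom=1.0, shift=(0, 0)):
    """Rotate (degrees), zoom (>1 enlarges, <1 shrinks), and shift (pixels)
    a 2-D greyscale image, keeping the original 28x28 frame."""
    out = ndimage.rotate(img, angle, reshape=False)
    out = ndimage.zoom(out, zoom)
    h, w = img.shape
    if zoom >= 1.0:
        # Zooming in grows the array: crop the center back to the frame.
        top = (out.shape[0] - h) // 2
        left = (out.shape[1] - w) // 2
        out = out[top:top + h, left:left + w]
    else:
        # Zooming out shrinks the array: pad it back out with black pixels.
        padded = np.zeros_like(img)
        top = (h - out.shape[0]) // 2
        left = (w - out.shape[1]) // 2
        padded[top:top + out.shape[0], left:left + out.shape[1]] = out
        out = padded
    return ndimage.shift(out, shift)

def showImage(img):
    plt.imshow(img, cmap="gray")
    plt.axis("off")
    plt.show()
```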

Assignment Task 1: Pick a random image from our test dataset, and do the following:

  1. Plot the image (make sure it is one that is easy to see rotations of).
  2. Rotate by 45 degrees and plot result.
  3. Rotate 45 plus zoom out (make the digit smaller) and plot result.
  4. Rotate 45 plus zoom out plus translate the image to the upper corner, and plot result.

Compare Stability of the Networks to Variations in the input

We are now ready to systematically answer the question: how well do the FCN and the CNN handle images that are slight (or not-so-slight) variations of the data they were trained on?

Here is what we will do:

  1. Loop over every image in the test set
  2. Choose a random +/- shift (in x and y) from a subset (0-4 pixels in increments of 1).
  3. Shift the image by that amount.
  4. Store the image in a list

When we are done, we will run that list of images through our original FCN and CNN and note the performance, comparing it to the default.
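The loop above can be sketched as a helper; using numpy's `roll` here is an assumption (it wraps pixels around the edges, whereas scipy.ndimage.shift would zero-fill, which may matter for digits near the border):

```python
import numpy as np

def randomShift(images, max_shift=4, seed=0):
    """Return a copy of images, each shifted by a random +/- amount
    (0 to max_shift pixels) in both x and y."""
    rng = np.random.default_rng(seed)
    out = np.empty_like(images)
    for i, img in enumerate(images):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        # np.roll wraps pixels around the edge of the frame.
        out[i] = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return out
```

In the notebook this would be applied as `shifted = randomShift(x_test)` before calling getPerformance on both networks.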

Compare FCN and CNN with a Chart and a Plot

Shortcomings of FCNs

We see that standard, fully-connected neural networks, although powerful, have some clear shortcomings when applied to image classification:

  1. They do not scale well. Reasonable-sized images would require an enormous number of parameters. This in turn would require a corresponding increase in the number of training samples in order to determine the parameters accurately.
  2. They are more dependent than CNNs on the specific pixel relationships within the image. Performance degrades substantially as soon as there is a minor deviation from these relationships.

Both of these issues are related: the FCN does not take advantage of the fact that - generally - in image classification, the images tend to be built from underlying common features. In the case of MNIST images, these are the curves and lines and corners which make up the individual digits. Convnets attempt to take advantage of these features.

Other Points of Comparison of CNNs and FCNs

There are a couple of things to notice when comparing the output from the code blocks above:

  1. The performance of the CNN is better than the FCN after 5 epochs. A careful examination of the training set accuracies reveals that the CNN is still undertrained (and so can perform better if we increase the number of epochs).
  2. The number of parameters needed to specify the CNN is 40.3k, about 7 times smaller than the FCN!
  3. The training time per step is much longer (about 10x) for the CNN than it is for the FCN.

Assignment Task 2: Compare Rotations

Using the same basic structure as above, compare how FCNs and CNNs perform on rotations.

Use the starter code below. You will need to randomize whether the rotation is clockwise or counterclockwise.

The output should be just like above for the shifts:

  1. A table comparing FCN and CNN for the 5 rotations
  2. A plot comparing the two.

Assignment Task 3: Compare Zooms

Using the same basic structure as above, compare how FCNs and CNNs perform on zooms.

Use the starter code below. You will not need to randomize the zoom.

The output should be just like above for the shifts:

  1. A table comparing FCN and CNN for the 5 zooms
  2. A plot comparing the two.